Objectives

  • Define Tufte’s theory of data graphics, data-ink ratio, and chartjunk
  • Present and compare examples of minimalistic graphics to their original form
  • Assess visualizations under the engineer/designers philosophy
library(tidyverse)
library(ggthemes)
library(knitr)
library(broom)
library(stringr)

options(digits = 3)
set.seed(1234)

Tufte’s world

  • Core purpose of visualization is to communicate quantitative information
    • Art is secondary
    • “Above all else show the data”
  • Goal is to maximize the data-ink ratio

    \[\text{Data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}\]

  • Data-ink - non-erasable core of a graphic
    • This is what Tufte says we should most care about
    • Minimize all extraneous fluff
  • What should we consider to be part of the “data-ink”?
    • Is this literally just the data? Don’t we need gridlines or axes? What else can be considered integral to the graph?
  • He never offers proof of his hypothesis that less is better

What is integral?

p <- ggplot(mpg, aes(cty, hwy)) +
  geom_point()
p

  • Data points
  • Axis ticks
  • Axis tick labels
  • Axis labels
  • Background
  • Grid lines

What happens if we strip away everything except the data?

p + theme_void()

Hmm, so what do we actually need to keep? What should we consider “integral”? What if we remove the background color?

p + theme_bw()

  • Remove panel box
p + theme_bw() +
    theme(panel.background = element_blank(),
          panel.border = element_blank())

  • Remove minor grid lines
p + theme_bw() +
    theme(panel.background = element_blank(),
          panel.border = element_blank(),
          strip.background = element_blank(),
          plot.background = element_blank(),
          axis.line = element_blank(),
          panel.grid.minor = element_blank())

  • Remove all grid lines
p + theme_bw() +
    theme(panel.background = element_blank(),
          panel.border = element_blank(),
          strip.background = element_blank(),
          plot.background = element_blank(),
          axis.line = element_blank(),
          panel.grid = element_blank())

  • Remove tick marks
p + theme_bw() +
    theme(panel.background = element_blank(),
          panel.border = element_blank(),
          strip.background = element_blank(),
          plot.background = element_blank(),
          axis.line = element_blank(),
          panel.grid = element_blank(),
          axis.ticks = element_blank())

  • Use serif font
p + theme_bw(base_family = "serif") +
    theme(panel.background = element_blank(),
          panel.border = element_blank(),
          strip.background = element_blank(),
          plot.background = element_blank(),
          axis.line = element_blank(),
          panel.grid = element_blank(),
          axis.ticks = element_blank())

What have we lost? Is this easier to interpret? Harder?

Chart junk

  • Vibrating moire effects
    • Hard to produce in ggplot2 - no support for them
    • Eye junk
    • Makes the graph harder to decode/interpret
  • The grid
    • Minimize/reduce the thickness of grid lines to ease interpretation
    • Less visual clutter to weed through
    • Add some compare/contrast with ggplot
  • The duck

  • Tufte concludes that forgoing chartjunk enables functionality and insight (as Cairo would describe it). Do you agree?

Compare Tufte minimal graphs to traditional graphs using ggplot2

  • ggthemes
  • Compare other themes for the same basic plot1

The goal of Tufte’s minimalism is to maximize the data-ink ratio, so we want to modify traditional or default graphs in R and ggplot2 to minimize use of extraneous ink.

Minimal line plot

x <- 1967:1977
y <- c(0.5, 1.8, 4.6, 5.3, 5.3, 5.7, 5.4, 5, 5.5, 6, 5)
d <- data_frame(x, y)

ggplot(d, aes(x, y)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(breaks = seq(1, 6, 1), label = sprintf("$%s", seq(300, 400, 20))) +
  labs(title = "Per capita budget expandures",
       x = "Year",
       y = "Per capita budget expandures\nin constant dollars")

  • We use geom_point() to draw the data points and geom_line() to connect the points
  • What is the extraneous ink on this graph?
    • Background
    • Title of graph and y-axis labels - redundant
    • x-axis label - year is obvious/self-explanatory
  • Missing context - how is this expansion meaningful over time?
ggplot(d, aes(x, y)) +
  geom_line() +
  geom_point(size = 3) +
  theme_tufte(base_size = 15) +
  theme(axis.title = element_blank()) +
  geom_hline(yintercept = c(5, 6), lty = 2) +
  scale_y_continuous(breaks = seq(1, 6, 1), label = sprintf("$%s", seq(300, 400, 20))) +
  scale_x_continuous(breaks = x, label = x) +
  annotate(
    "text",
    x = c(1977, 1977.2),
    y = c(1.5, 5.5),
    adj = 1,
    family = "serif",
    label = c("Per capita\nbudget expandures\nin constant dollars", "5%")
  )

  • Remove axis and graph titles
  • Adds text annotation within graph
  • Highlights a 5% increase in per capital expandures
  • Changes font to be more aesthetically pleasing, not so blockish

Minimal boxplot

ggplot(quakes, aes(factor(mag), stations)) +
  geom_boxplot() +
  labs(title = "Fiji earthquakes",
       x = "Richter Magnitude",
       y = "Number of stations reporting earthquakes")

  • Key features of a boxplot
    • Lines to indicate:
      • Maximum of IQR
      • 3rd quartile
      • Median
      • 1st quartile
      • Minimum of IQR
    • Dots for outliers
  • How many different line strokes do we use?
    • 8 for each graph
    • \(8 \times 22 = 176\)
  • This is extraneous ink
ggplot(quakes, aes(factor(mag), stations)) +
  theme_tufte() +
  geom_tufteboxplot(outlier.colour = "transparent") +
  theme(axis.title = element_blank()) +
  annotate(
    "text",
    x = 8,
    y = 120,
    adj = 1,
    family = "serif",
    label = c(
      "Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)"
    )
  )

  • Now we use only 22 verticals to show the same data. It could easily be drawn by hand with a single vertical for each category on the x-axis
  • Doesn’t show outlier info, but is this really necessary?
  • Also removes the background color and gridlines
ggplot(quakes, aes(factor(mag), stations)) +
  theme_tufte() +
  geom_tufteboxplot(median.type = "line") +
  theme(axis.title = element_blank()) +
  annotate(
    "text",
    x = 8,
    y = 120,
    adj = 1,
    family = "serif",
    label = c(
      "Number of stations \nreporting Richter Magnitude\nof Fiji earthquakes (n=1000)"
    )
  )

  • Here we use offsetting lines to indicate the middle half of the data rather than using a gap
  • Is this prettier? Easier to interpret?

Minimal barchart

library(psych)
library(reshape2)

d <- melt(colMeans(msq[, c(2, 7, 34, 36, 42, 43, 46, 55, 68)], na.rm = T) *
            10)
d$trait <- rownames(d)

p <- ggplot(d, aes(x = trait, y = value)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(breaks = seq(1, 5, 1)) +
  labs(title = "Watson et al., 1998",
       subtitle = "N = 3896",
       x = "Negative emotion traits",
       y = "Average score")
p

  • Again, the background is the main culprit - though ggplot2 offers other default backgrounds
p + theme_bw()

p + theme_dark()

p + theme_classic()

p + theme_minimal()

p + theme_void()

ggplot(d, aes(x = trait, y = value)) +
  theme_tufte(base_size = 14, ticks = F) +
  geom_bar(width = 0.25, fill = "gray", stat = "identity") +
  theme(axis.title = element_blank()) +
  scale_y_continuous(breaks = seq(1, 5, 1)) +
  geom_hline(yintercept = seq(1, 5, 1),
             col = "white",
             lwd = 1) +
  annotate(
    "text",
    x = 3.5,
    y = 5,
    adj = 1,
    family = "serif",
    label = c(
      "Average scores\non negative emotion traits
from 3896 participants\n(Watson et al., 1988)"
    )
  )

  • Erases the box/grid background
  • Removes vertical axis
  • Use a white grid to show coordinate lines through the absence of ink, rather than adding ink
    • Allows us to remove tick marks as well

Range-frame scatterplot

ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  xlab("Car weight (lb/1000)") +
  ylab("Miles per gallon of fuel")

  • A standard bivariate scatterplot
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_rangeframe() +
  theme_tufte() +
  xlab("Car weight (lb/1000)") +
  ylab("Miles per gallon of fuel") +
  theme(axis.title.x = element_text(vjust = -0.5),
        axis.title.y = element_text(vjust = 1.5))

  • Use the frame/axis lines of the graph to communicate important information
    • Extends only to minimum/maximum values in the data, rather than arbitrary points
    • Explicitly identifies the minimum and maximum values

With a quartile plot

library(devtools)
source_url('https://raw.githubusercontent.com/bearloga/Quartile-frame-Scatterplot/master/qfplot.R')
## SHA-1 hash of file is fe88d63ea7111be1a61ea5d36df1bb9c196fba73
qfplot(
  x = mtcars$wt,
  y = mtcars$mpg,
  xlab = "Car weight (lb/1000)",
  ylab = "Miles per gallon of fuel"
)

  • Combine with info on the quartiles of the data to show case this info as well
  • Thicker bar indicates inner two quartiles
  • Median is explicitly labeled

Reconsidering Tufte

When is redundancy better?

Traditional bar chart of crime in the city of San Francisco, 2009-10. Source: Visualizing Time with the Double-Time Bar Chart

Traditional bar chart of crime in the city of San Francisco, 2009-10. Source: Visualizing Time with the Double-Time Bar Chart

Double-time bar chart of crime in the city of San Francisco, 2009-10. Source: Visualizing Time with the Double-Time Bar Chart

Double-time bar chart of crime in the city of San Francisco, 2009-10. Source: Visualizing Time with the Double-Time Bar Chart

  • Each set of 24 bars show the same data. The top bars run from midnight to 11pm. The bottom bars run from noon to 11am.
  • Highlighted regions represent 6-5 (6am-5pm; 6pm-5am)
  • Colors represent (roughly) day and night (yellow for day, blue for night)
  • Enables representing trends over a 24 hour period without breaking arbitrarily at midnight

  • The second graph is incredibly redundant, but which is easier to interpret?
    • Does it pass Tufte’s test?
    • What does it mean to be “integral”?

Does minimalism really help here?

  • Accompanying an article declaring that student progress on NAEP tests has come to a virtual standstill
  • What is the comparison we should make?
  • Is this too much color?
  • Meets Tufte’s minimalist standards, probably has a decent data-ink ratio
  • Note Grade 4 math scores for whites in 2009-2015 - does this mean no progress or unknown scores?
  • This version is much clearer - specifically tells us how to compare the scores
  • Removes color as a channel, using linetype instead
    • In this situation, is that better or worse?
  • Title for the graph makes clear the point tryingg to be made

Experimental tests of Tufte’s claims

  • How do we know Tufte’s claims are true? We can test them with experiments!

Protocol

  • Compared chartjunk versions of graphs to standard/minimalist versions of graphs
  • Tested individuals on chart description and recall
  • 20 subjects split into short and long-term recall groups
    • Quite a small sample of convenience (university population)
  • Collected measures
    • Response scores - did the individual correctly read/interpret the chart?
    • Preferences - which type of chart did the individual prefer? Standard or embellished?
    • Gaze data - where did the subject look during the experiment? At data regions or embellishment regions?

Results

  • No difference for description
  • No difference for immediate recall
  • Embellished images slightly better for long-term recall (12-22 days after treatment)

Discussing the results

  • Why did the chartjunk not lead to worse description and recall?
    • Chartjunk was related to the topic of the chart
    • “Gets to the point quicker”
  • Why would the embellished images produce better long-term recall?
    • Very vivid image
    • Value message - individual believes author is trying to communicate a set of values
    • Embellished images produced more value messages
  • Should visualizations be “objective”?
    • Tufte seems to think so: minimalism leads to the data speaking for itself - do we buy this?

Rethinking Tufte’s definition of visual excellence

  • Too many of Tufte’s claims are based on nothing - no evidence to support his minimalistic approach to graphical design
  • Think of the hockey stick chart vs. xkcd on Earth’s temperature
    • According to Tufte, both probably have a lot of chartjunk (xkcd more so)
    • But if asked to remember the importance and story of the graph weeks later, which one do you think the average reader would recall better?

Testing this theory

  • Design an experiment to test the impact/effectiveness of chartjunk vs. minimalism
  • What protocols/features could we use? How could we deploy the experiment?
    • If deployed on a platform such as Amazon MTurk, what are the benefits and drawbacks?

Statistical graphics vs infographics

Different goals

Statistical graphics

  • Facilitate an understanding of problems in an applied problem
  • Direct readers to specific information
  • Allow readers to see for themselves

Infographics

  • Attractive
  • Grab one’s attention
  • Tell a story
  • Encourage viewer to think about a particular dataset

Assessing the visualization

Wordles/wordclouds

library(twitteR)
## 
## Attaching package: 'twitteR'
## The following objects are masked from 'package:dplyr':
## 
##     id, location
library(tidytext)

# setup API authentication
setup_twitter_oauth(getOption("twitter_api_key"),
                    getOption("twitter_api_token"))
## [1] "Using browser based authentication"
# get fresh trump tweets
trump <- userTimeline("realDonaldTrump", n = 3200) %>%
  twListToDF %>%
  as_tibble

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"   # custom regular expression to tokenize tweets

# tokenize
trump_token <- trump %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
library(wordcloud)
## Loading required package: RColorBrewer
library(RColorBrewer)

pal <- brewer.pal(8, "Set1")

trump_token %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100, scale = c(2,.5),
       colors = pal))

  • What does this graph do well?
  • How does it advance the goals (or not) of Tufte/Gelman/Unwin?
  • How does it advance the goals (or not) of Nigel Holmes?
# get tweets
pope <- userTimeline("Pontifex", n = 3200) %>%
  twListToDF %>%
  as_tibble

# tokenize
pope_token <- pope %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
library(reshape2)

bind_rows(Trump = trump_token, Pope = pope_token, .id = "person") %>%
  count(word, person) %>%
  acast(word ~ person, value.var = "n", fill = 0) %>%
  comparison.cloud(max.words = 100, colors = c("blue", "red"))

The size of a word’s text is in proportion to its frequency within its category (i.e. proportion of all Trump tweets or all pope tweets). We can use this visualization to see the most frequent words/hashtags by President Trump and Pope Francis, but the sizes of the words are not comparable across sentiments.

  • What does this graph do well?
  • How does it advance the goals (or not) of Tufte/Gelman/Unwin?
  • How does it advance the goals (or not) of Nigel Holmes?
bind_rows(Trump = trump_token, Pope = pope_token, .id = "person") %>%
  count(word, person) %>%
  spread(person, n, fill = 0) %>%
  ggplot(aes(Pope, Trump, label = word)) +
  geom_text() +
  geom_abline(slope = 1, linetype = 2)

Health-care spending and life expectancy

  • Original
  • Revised

  • What are the advantages/disadvantages to the first plot?
  • What are the advantages/disadvantages to the second plot?

Session Info

devtools::session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.3.3 (2017-03-06)
##  system   x86_64, darwin13.4.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/Chicago             
##  date     2017-04-10
## Packages ------------------------------------------------------------------
##  package      * version  date       source         
##  assertthat     0.1      2013-12-06 CRAN (R 3.3.0) 
##  backports      1.0.5    2017-01-18 CRAN (R 3.3.2) 
##  bit            1.1-12   2014-04-09 CRAN (R 3.3.0) 
##  bit64          0.9-5    2015-07-05 CRAN (R 3.3.0) 
##  broom        * 0.4.2    2017-02-13 CRAN (R 3.3.2) 
##  codetools      0.2-15   2016-10-05 CRAN (R 3.3.3) 
##  colorspace     1.3-2    2016-12-14 CRAN (R 3.3.2) 
##  curl           2.3      2016-11-24 CRAN (R 3.3.2) 
##  DBI            0.6      2017-03-09 CRAN (R 3.3.3) 
##  devtools     * 1.12.0   2016-06-24 CRAN (R 3.3.0) 
##  digest         0.6.12   2017-01-27 CRAN (R 3.3.2) 
##  dplyr        * 0.5.0    2016-06-24 CRAN (R 3.3.0) 
##  evaluate       0.10     2016-10-11 CRAN (R 3.3.0) 
##  forcats        0.2.0    2017-01-23 CRAN (R 3.3.2) 
##  foreign        0.8-67   2016-09-13 CRAN (R 3.3.3) 
##  ggplot2      * 2.2.1    2016-12-30 CRAN (R 3.3.2) 
##  ggthemes     * 3.4.0    2017-02-19 CRAN (R 3.3.2) 
##  gtable         0.2.0    2016-02-26 CRAN (R 3.3.0) 
##  haven          1.0.0    2016-09-23 cran (@1.0.0)  
##  hms            0.3      2016-11-22 CRAN (R 3.3.2) 
##  htmltools      0.3.5    2016-03-21 CRAN (R 3.3.0) 
##  httr           1.2.1    2016-07-03 CRAN (R 3.3.0) 
##  janeaustenr    0.1.4    2016-10-26 CRAN (R 3.3.0) 
##  jsonlite       1.3      2017-02-28 CRAN (R 3.3.2) 
##  knitr        * 1.15.1   2016-11-22 cran (@1.15.1) 
##  labeling       0.3      2014-08-23 CRAN (R 3.3.0) 
##  lattice        0.20-34  2016-09-06 CRAN (R 3.3.3) 
##  lazyeval       0.2.0    2016-06-12 CRAN (R 3.3.0) 
##  lubridate      1.6.0    2016-09-13 CRAN (R 3.3.0) 
##  magrittr       1.5      2014-11-22 CRAN (R 3.3.0) 
##  Matrix         1.2-8    2017-01-20 CRAN (R 3.3.3) 
##  memoise        1.0.0    2016-01-29 CRAN (R 3.3.0) 
##  mnormt         1.5-5    2016-10-15 CRAN (R 3.3.0) 
##  modelr         0.1.0    2016-08-31 CRAN (R 3.3.0) 
##  munsell        0.4.3    2016-02-13 CRAN (R 3.3.0) 
##  nlme           3.1-131  2017-02-06 CRAN (R 3.3.3) 
##  openssl        0.9.6    2016-12-31 CRAN (R 3.3.2) 
##  plyr           1.8.4    2016-06-08 CRAN (R 3.3.0) 
##  psych        * 1.7.3.21 2017-03-22 CRAN (R 3.3.2) 
##  purrr        * 0.2.2    2016-06-18 CRAN (R 3.3.0) 
##  R6             2.2.0    2016-10-05 CRAN (R 3.3.0) 
##  RColorBrewer * 1.1-2    2014-12-07 CRAN (R 3.3.0) 
##  Rcpp           0.12.10  2017-03-19 cran (@0.12.10)
##  readr        * 1.1.0    2017-03-22 cran (@1.1.0)  
##  readxl         0.1.1    2016-03-28 CRAN (R 3.3.0) 
##  reshape2     * 1.4.2    2016-10-22 CRAN (R 3.3.0) 
##  rjson          0.2.15   2014-11-03 cran (@0.2.15) 
##  rmarkdown      1.3      2016-12-21 CRAN (R 3.3.2) 
##  rprojroot      1.2      2017-01-16 CRAN (R 3.3.2) 
##  rvest          0.3.2    2016-06-17 CRAN (R 3.3.0) 
##  scales         0.4.1    2016-11-09 CRAN (R 3.3.1) 
##  slam           0.1-40   2016-12-01 CRAN (R 3.3.2) 
##  SnowballC      0.5.1    2014-08-09 cran (@0.5.1)  
##  stringi        1.1.2    2016-10-01 CRAN (R 3.3.0) 
##  stringr      * 1.2.0    2017-02-18 CRAN (R 3.3.2) 
##  tibble       * 1.2      2016-08-26 cran (@1.2)    
##  tidyr        * 0.6.1    2017-01-10 CRAN (R 3.3.2) 
##  tidytext     * 0.1.2    2016-10-28 CRAN (R 3.3.0) 
##  tidyverse    * 1.1.1    2017-01-27 CRAN (R 3.3.2) 
##  tokenizers     0.1.4    2016-08-29 CRAN (R 3.3.0) 
##  twitteR      * 1.1.9    2015-07-29 CRAN (R 3.3.0) 
##  withr          1.0.2    2016-06-20 CRAN (R 3.3.0) 
##  wordcloud    * 2.5      2014-06-13 CRAN (R 3.3.0) 
##  xml2           1.1.1    2017-01-24 CRAN (R 3.3.2) 
##  yaml           2.1.14   2016-11-12 cran (@2.1.14)

  1. Source for examples: Tufte in R